5 research outputs found

    Probabilistic models for data efficient reinforcement learning

    Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, standard deep learning methods often overlook the progress made in control theory by treating systems as black boxes. We propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency but is also a principled way to perform RL in constrained environments.

    When the true state of the dynamical system cannot be fully observed, standard model-based methods cannot be applied directly; such systems require an additional state estimation step. We propose distributed message passing for state estimation in non-linear dynamical systems. In particular, we propose to use expectation propagation (EP) to iteratively refine the state estimate, i.e., the Gaussian posterior distribution over the latent state. We show two things: (a) classical Rauch-Tung-Striebel (RTS) smoothers, such as the extended Kalman smoother (EKS) or the unscented Kalman smoother (UKS), are special cases of our message passing scheme; (b) running the message passing scheme more than once can lead to significant improvements over the classical RTS smoothers. We make the connection between message passing with EP and the well-known RTS smoothers explicit and provide a practical implementation of the suggested algorithm. Furthermore, we address convergence issues of EP by generalising this framework to damped updates and the consideration of general alpha-divergences.

    Probabilistic models can also be used to generate synthetic data. In model-based RL, we use 'synthetic' data as a proxy for real environments in order to achieve high data efficiency. The ability to generate high-fidelity synthetic data is crucial when available (real) data is limited, as in RL, or where privacy and data protection standards allow only limited use of the given data, e.g., in medical and financial data-sets. Current state-of-the-art methods for synthetic data generation are based on generative models, such as Generative Adversarial Networks (GANs). Even though GANs have achieved remarkable results in synthetic data generation, they are often challenging to interpret. Furthermore, GAN-based methods can struggle with mixed real-valued and categorical variables. Moreover, the design of the loss function (discriminator loss) is itself problem-specific, i.e., the generative model may not be useful for tasks it was not explicitly trained for. We therefore propose to use a probabilistic model as a synthetic data generator. Learning the probabilistic model for the data is equivalent to estimating the density of the data. Based on copula theory, we divide the density estimation task into two parts: estimating the univariate marginals and estimating the multivariate copula density over the univariate marginals. We use normalising flows to learn both the copula density and the univariate marginals. We benchmark our method on both simulated and real data-sets in terms of density estimation as well as the ability to generate high-fidelity synthetic data.
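
    To make the two-step construction concrete, here is a minimal sketch that fits the marginals and the copula separately and then samples synthetic data. It is a simplified stand-in for the method above: kernel-density marginals and a Gaussian copula replace the normalising flows used in the paper, and the helper names (fit_copula_model, sample_synthetic) and the numerical inversion grid are illustrative assumptions.

    import numpy as np
    from scipy import stats

    def fit_copula_model(X):
        """X: (n_samples, n_dims) array of continuous data."""
        d = X.shape[1]
        # Step 1: univariate marginals (KDE as a stand-in for a marginal flow).
        marginals = [stats.gaussian_kde(X[:, j]) for j in range(d)]
        # Probability-integral transform of each column to (0, 1).
        U = np.column_stack([
            [marginals[j].integrate_box_1d(-np.inf, x) for x in X[:, j]]
            for j in range(d)
        ])
        # Step 2: copula over the uniforms (Gaussian copula as a stand-in for
        # a copula flow): correlation matrix of the Gaussian scores.
        Z = stats.norm.ppf(np.clip(U, 1e-6, 1 - 1e-6))
        corr = np.corrcoef(Z, rowvar=False)
        return marginals, corr

    def sample_synthetic(marginals, corr, n_samples):
        """Sample the copula, then invert each marginal CDF numerically."""
        d = corr.shape[0]
        Z = np.random.multivariate_normal(np.zeros(d), corr, size=n_samples)
        U = stats.norm.cdf(Z)
        X = np.empty_like(U)
        grid = np.linspace(-5.0, 5.0, 2001)  # assumes roughly standardised data
        for j in range(d):
            cdf = np.array([marginals[j].integrate_box_1d(-np.inf, g) for g in grid])
            X[:, j] = np.interp(U[:, j], cdf, grid)
        return X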

    Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

    Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. Such a large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency but is also a principled way to perform RL in constrained environments.
    Comment: Accepted at AISTATS 2018
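
    A minimal sketch of the loop described above: fit a GP to observed transitions, then choose actions by model-predictive control over the learned model. Random shooting stands in for the paper's gradient-based optimisation, and only the GP mean is propagated rather than the full predictive distribution; the function names, the box constraint on actions, and the quadratic example cost are assumptions.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    def fit_gp_dynamics(states, actions, next_states):
        # GP transition model on (state, action) inputs, predicting state deltas.
        X = np.hstack([states, actions])
        Y = next_states - states
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
        gp.fit(X, Y)
        return gp

    def mpc_action(gp, s0, cost, horizon=10, n_candidates=200, a_dim=1):
        """Return the first action of the best sampled control sequence."""
        best_a, best_cost = None, np.inf
        for _ in range(n_candidates):
            # Candidate action sequence under a box control constraint.
            A = np.random.uniform(-1.0, 1.0, size=(horizon, a_dim))
            s, total = s0.copy(), 0.0
            for a in A:
                delta = gp.predict(np.hstack([s, a])[None, :])[0]
                s = s + delta                   # mean propagation only
                total += cost(s, a)
            if total < best_cost:
                best_a, best_cost = A[0], total
        return best_a

    # Example quadratic cost pulling the state to the origin (hypothetical):
    # cost = lambda s, a: float(s @ s + 0.01 * a @ a)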

    Iterative State Estimation in Non-linear Dynamical Systems Using Approximate Expectation Propagation

    Bayesian inference in non-linear dynamical systems seeks good posterior approximations of a latent state given a sequence of observations. Gaussian filters and smoothers, including the (extended/unscented) Kalman filter/smoother, which are commonly used in engineering applications, yield Gaussian posteriors on the latent state. While they are computationally efficient, they are often criticised for their crude approximation of the posterior state distribution. In this paper, we address this criticism by proposing a message passing scheme for iterative state estimation in non-linear dynamical systems, which yields more informative (Gaussian) posteriors on the latent states. Our message passing scheme is based on expectation propagation (EP). We prove that classical Rauch--Tung--Striebel (RTS) smoothers, such as the extended Kalman smoother (EKS) or the unscented Kalman smoother (UKS), are special cases of our message passing scheme. Running the message passing scheme more than once can lead to significant improvements over the classical RTS smoothers, so that more informative state estimates can be obtained. We address potential convergence issues of EP by generalising our state estimation framework to damped updates and the consideration of general alpha-divergences.
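
    The special-case claim can be illustrated with an iterated extended Kalman smoother: a single forward-backward sweep, linearised at the running filter means, is the classical EKS, while further sweeps relinearise around the latest smoothed means with a damping factor on the updates, mirroring the damped EP iterations. This sketch refines only the linearisation points (the full EP scheme also refines the covariance messages); f, h and their Jacobians F, H are problem-specific placeholders.

    import numpy as np

    def iterated_eks(y, m0, P0, f, F, h, H, Q, R, sweeps=3, gamma=0.7):
        # y: list of observations; f/h: dynamics/measurement functions;
        # F/H: their Jacobians; Q/R: process/measurement noise covariances.
        T = len(y)
        ms = None                        # smoothed means from the previous sweep
        for _ in range(sweeps):
            mf, Pf, mp, Pp = [], [], [], []
            m, P = m0, P0
            for t in range(T):
                if t > 0:                # EKF prediction, linearised at lp
                    lp = mf[t - 1] if ms is None else ms[t - 1]
                    A = F(lp)
                    m = f(lp) + A @ (m - lp)
                    P = A @ P @ A.T + Q
                mp.append(m); Pp.append(P)
                lp = m if ms is None else ms[t]   # measurement linearisation point
                C = H(lp)
                S = C @ P @ C.T + R
                K = P @ C.T @ np.linalg.inv(S)
                m = m + K @ (y[t] - (h(lp) + C @ (m - lp)))
                P = P - K @ S @ K.T
                mf.append(m); Pf.append(P)
            # backward Rauch-Tung-Striebel pass
            new_ms = [None] * T
            sm, SP = mf[-1], Pf[-1]
            new_ms[-1] = sm
            for t in range(T - 2, -1, -1):
                lp = mf[t] if ms is None else ms[t]
                A = F(lp)
                G = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
                sm = mf[t] + G @ (sm - mp[t + 1])
                SP = Pf[t] + G @ (SP - Pp[t + 1]) @ G.T
                new_ms[t] = sm
            # damped relinearisation, as in damped EP updates
            ms = new_ms if ms is None else [gamma * n + (1 - gamma) * o
                                            for n, o in zip(new_ms, ms)]
        return ms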

    This publication is released under the following Creative Commons licence:

    I hereby declare that I have written this Master's thesis without the help of third parties and using only the cited sources and aids. All passages taken from sources are marked as such. This work has not been submitted in the same or a similar form to any examination authority. Darmstadt, January 31, 2014 (Sanket Kamthe)

    Multi-modal densities appear frequently in time series and practical applications. However, they cannot be represented by common state estimators, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), which moreover often fail to capture uncertainty sufficiently well, resulting in incoherent and divergent tracking performance. In this thesis, we address these issues by devising a non-linear filtering algorithm in which densities are represented by Gaussian mixture models whose parameters are estimated in closed form. The filtered results can be further improved by a backward pass, or smoothing. However, the optimal backward filter does not offer a closed-form solution.
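
    A classical Gaussian-sum filter step illustrates the closed-form mixture updates the abstract refers to: each mixture component is propagated with an EKF-style prediction and measurement update, and the mixture weights are re-scaled by each component's measurement likelihood. This is a textbook simplification, not necessarily the thesis algorithm; f, h, the Jacobians F, H, and the noise covariances Q, R are hypothetical problem-specific inputs.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gaussian_sum_step(weights, means, covs, u, y, f, F, h, H, Q, R):
        """One filtering step with a Gaussian-mixture belief."""
        new_w, new_m, new_P = [], [], []
        for w, m, P in zip(weights, means, covs):
            # EKF prediction for this component
            A = F(m, u)
            m_pred = f(m, u)
            P_pred = A @ P @ A.T + Q
            # EKF measurement update
            C = H(m_pred)
            S = C @ P_pred @ C.T + R
            K = P_pred @ C.T @ np.linalg.inv(S)
            m_post = m_pred + K @ (y - h(m_pred))
            P_post = P_pred - K @ S @ K.T
            # weight proportional to this component's likelihood of y
            lik = multivariate_normal.pdf(y, mean=h(m_pred), cov=S)
            new_w.append(w * lik); new_m.append(m_post); new_P.append(P_post)
        new_w = np.array(new_w)
        return new_w / new_w.sum(), new_m, new_P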